class: center, middle, inverse, title-slide

# Describing Data Graphically with R
### S. Mason Garrison

---
layout: true

<div class="my-footer">
<span>
<a href="https://DataScience4Psych.github.io/DataScience4Psych/" target="_blank">Methods in Psychological Research</a>
</span>
</div>

---
class: middle

# Describing Data

---

## Hans Rosling
---

# Summarize

.pull-left[
- Transform a pile of numbers into a summary
- Descriptive Statistics
- The distribution of a variable is a table or graph showing the categories or values of its outcomes and how often (frequency or percentage) each occurs
- Exploratory Data Analysis (Tukey, 1977)
]

.pull-right[
]

---

# Exploratory Data Analysis

.pull-left[
- Tukey (1977)
- EDA
- Graphical Data Analysis
- Numbers as summaries
- Emphasized Robust Statistics
]

--

.pull-right[
]

---

# Descriptive Statistics

- Examples
- Tables
- Graphs
- Summary Statistics

---

# Tables

- Woodbridge (1845)

---

# Graphs

- Minard (1869)

---

# Examples

- Summary Statistics
- Measures of Central Tendency
- Measures of Spread

---

# Categorical Variable Displays (Nominal, Ordinal)

- Frequency Distribution Graphs
- Bar Chart
- Pie Chart
- Quantitative Variables
- Histograms
- Stem plots
- Time Plots

---

# Frequency distribution graph

.pull-left[
- Bar Chart
- Graphs of variables with the outcome categories on the x-axis and the frequency or percent of each category on the y-axis.
]

.pull-right[
]

---

# Bar Graph/Chart

.pull-left[
```r
# Bar chart
# mtcars ships with base R (datasets package), so no extra package is needed
counts <- table(mtcars$gear)
```
]

.pull-right[
```r
barplot(counts, main = "Car Distribution", xlab = "Number of Gears")
```

<img src="data:image/png;base64,#descriptive_files/figure-html/unnamed-chunk-4-1.png" width="90%" style="display: block; margin: auto;" />
]

---

# Stacked Bar Chart

.pull-left[
```r
df <- data.frame(
  group = c("Male", "Female", "Child"),
  value = c(25, 25, 50)
)
head(df)
```

```
##    group value
## 1   Male    25
## 2 Female    25
## 3  Child    50
```
]

.pull-right[
```r
library(ggplot2)
# A single bar (x = "") stacked by group; the outer parentheses print the plot
(bp <- ggplot(df, aes(x = "", y = value, fill = group)) +
  geom_bar(width = 1, stat = "identity"))
```

<img src="data:image/png;base64,#descriptive_files/figure-html/unnamed-chunk-6-1.png" width="90%" style="display: block; margin: auto;" />
]

---

# Pie Chart

.pull-left[
- Graphs of variables in which each slice of the pie shows the frequency or percent of one outcome category.
]

.pull-right[
]

---

# Pie chart

.pull-left[
```r
slices <- c(10, 12, 4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France")
```
]

.pull-right[
```r
pie(slices, labels = lbls, main = "Pie Chart of Countries")
```

<img src="data:image/png;base64,#descriptive_files/figure-html/unnamed-chunk-8-1.png" width="90%" style="display: block; margin: auto;" />
]

---

# Example 2

.small[
```r
mytable <- table(iris$Species)
lbls <- paste(names(mytable), "\n", mytable, sep = "")
pie(mytable, labels = lbls,
    main = "Pie Chart of Species\n (with sample sizes)")
```

<img src="data:image/png;base64,#descriptive_files/figure-html/unnamed-chunk-9-1.png" width="90%" style="display: block; margin: auto;" />
]

---

# Convert Bar Chart into Pie Chart

.small[
```r
pie <- bp + coord_polar("y", start = 0)
pie
```

<img src="data:image/png;base64,#descriptive_files/figure-html/unnamed-chunk-10-1.png" width="90%" style="display: block; margin: auto;" />

```r
pie + scale_fill_manual(values = c("#999999", "#E69F00", "#56B4E9"))
```

<img src="data:image/png;base64,#descriptive_files/figure-html/unnamed-chunk-10-2.png" width="90%" style="display: block; margin: auto;" />
]

[Additional Resources](http://www.sthda.com/english/wiki/ggplot2-pie-chart-quick-start-guide-r-software-and-data-visualization)

---

# Quantitative Variables

- Interval or Ratio Scales
- Histograms
- Stem plots
- Time plots

---

# Histogram

- A histogram is a graphical representation of the distribution of numerical data.
- Approximates a probability distribution
- First described in @Pearson1895.
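
---

# Histogram

As a rough sketch of what a histogram computes, `hist()` can return the bins and counts without drawing anything. This example assumes the built-in `faithful` data; the names `x` and `h` are chosen here purely for illustration.

```r
# Illustrative sketch; `x` and `h` are names chosen for this example
x <- faithful$eruptions          # eruption durations (built-in data set)

# Compute the histogram without drawing it
h <- hist(x, plot = FALSE)

h$breaks                         # bin edges
h$counts                         # number of observations in each bin

# On the density scale, bar areas (height * width) add up to 1,
# which is the sense in which a histogram approximates a
# probability distribution
sum(h$density * diff(h$breaks))

# Draw the histogram on the density scale
hist(x, freq = FALSE, main = "Old Faithful eruptions",
     xlab = "Eruption duration (minutes)")
```

Setting `freq = FALSE` changes only the y-axis scale; the shape of the histogram is the same as with counts.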
---

# Histogram

.pull-left[
```r
library(MASS)
variable <- cats$Bwt
hist(variable)
```

<img src="data:image/png;base64,#descriptive_files/figure-html/unnamed-chunk-11-1.png" width="90%" style="display: block; margin: auto;" />
]

.pull-right[
```r
variable <- variable * 2.2 # Convert kilograms to pounds (approximate factor)
hist(variable)
```

<img src="data:image/png;base64,#descriptive_files/figure-html/unnamed-chunk-12-1.png" width="90%" style="display: block; margin: auto;" />
]

---

# Stemplot

- Sometimes called a stem-and-leaf diagram

```r
# Stem and Leaf plot
stem(faithful$eruptions, scale = 1)
```

```
## 
##   The decimal point is 1 digit(s) to the left of the |
## 
##   16 | 070355555588
##   18 | 000022233333335577777777888822335777888
##   20 | 00002223378800035778
##   22 | 0002335578023578
##   24 | 00228
##   26 | 23
##   28 | 080
##   30 | 7
##   32 | 2337
##   34 | 250077
##   36 | 0000823577
##   38 | 2333335582225577
##   40 | 0000003357788888002233555577778
##   42 | 03335555778800233333555577778
##   44 | 02222335557780000000023333357778888
##   46 | 0000233357700000023578
##   48 | 00000022335800333
##   50 | 0370
```

---

# Time Plots

<img src="data:image/png;base64,#../img/minard.png" width="90%" style="display: block; margin: auto;" />

- Edward Tufte has said that Minard's plot:

> "may well be the best statistical graphic ever drawn"

- It packs a ton of information into one dense figure.
---

# Time Plots

<img src="data:image/png;base64,#../img/minard.png" width="90%" style="display: block; margin: auto;" />

---

- The plot contains six variables, each mapped to a different aesthetic:

| Information                           | Aesthetic       |
|---------------------------------------|-----------------|
| Size of Napoleon's Grande Armée       | Width of path   |
| Longitude of the army's position      | x-axis          |
| Latitude of the army's position       | y-axis          |
| Direction of the army's movement      | Color of path   |
| Date of points along retreat path     | Text below plot |
| Temperature during the army's retreat | Line below plot |

---

# Recreation in R

- This plot has been recreated in R by:
  - [Andrew Heiss](https://www.andrewheiss.com/blog/2017/08/10/exploring-minards-1812-plot-with-ggplot2/)
  - [Michael Friendly](http://www.datavis.ca/gallery/re-minard.php)
  - @Wickham2010 [link](https://www.tandfonline.com/doi/suppl/10.1198/jcgs.2009.07098?scroll=top)

---

# Side by Side

<img src="data:image/png;base64,#../img/minard.png" width="90%" style="display: block; margin: auto;" />

<img src="data:image/png;base64,#descriptive_files/figure-html/unnamed-chunk-17-1.png" width="90%" style="display: block; margin: auto;" />

---

# More Accessible Resources

- [R Graph Catalog](http://shinyapps.stat.ubc.ca/r-graph-catalog/)
- [intRo](http://www.intro-stats.com/)

---

# R Basics

- Installation
  - R can be downloaded from one of the mirror sites listed at http://cran.r-project.org/mirrors.html. Pick the location nearest you.
- Using External Data
  - R offers plenty of options for loading external data, including Excel, Minitab, and SPSS files.
- R Session
  - After R starts, a console waits for input. At the prompt (>), you can enter numbers and perform calculations.

        > 1 + 2
        [1] 3

---

# Variable Assignment

We assign values to variables with the assignment operator "=". Typing the variable name by itself at the prompt prints its value.
Note that another assignment operator, "<-", is also in common use.

```
## [1] 1
```

# Functions

An R function is invoked by its name, followed by parentheses containing zero or more arguments. For example, `c(1, 2, 3)` applies the function c to combine three numeric values into a vector.

```
## [1] 1 2 3
```

---

# Comments

All text after the pound sign "#" on the same line is treated as a comment.

    > 1 + 1      # this is a comment
    [1] 2

---

# Extension Package

Sometimes we need functionality beyond what the core R library offers. To install an extension package, invoke the install.packages function at the prompt and follow the instructions.

    > install.packages()

---

# Getting Help

R provides extensive documentation. For example, entering ?c or help(c) at the prompt displays the documentation for the function c. Please give it a try.

    > help(c)

---

If you are not sure of the name of the function you are looking for, you can perform a fuzzy search with the apropos function.

    > apropos("nova")
    [1] "anova" "anova.glm"
       ....

# References

---

# Wrapping Up...

<br><br>